New parameters: max_candidates, split_decay_rate, delete_leaves. split_try has new behaviour #60
Some enhancements/modifications.

Three new parameters added

max_candidates (default=50): After many splits, the number of split options for which the resulting loss has to be calculated explodes. The number of options considered is now changed from (t_try * possible options) to max(max_candidates, t_try * possible options). With this change, the splits parameter can be set much higher, because the computational cost now grows only linearly instead of quadratically.

split_decay_rate (default=0.1): Possible splits are initiated with age=0. Whenever a possible split becomes a split_candidate (i.e. it has been drawn when drawing max(max_candidates, t_try * possible options) times), it ages by +1. The age of the single split_candidate that actually has minimal loss is reset to zero. A high decay rate means faster aging. split_decay_rate=0 results in no aging and therefore mimics the old behaviour.

delete_leaves (default=1): Originally, if a leaf was split with respect to a variable that is already part of the leaf, the leaf was deleted and replaced by the newly created children. This is still the default behaviour. Alternatively, one can choose delete_leaves=0. In this case leaves are never deleted, i.e. a split always results in two new leaves while the parent is kept.

New behaviour of split_try
Before, split_try determined how many split points to try out in each leaf of every split_candidate. The number of splits tried out in a split_candidate was therefore (number of leaves in split_candidate) * split_try. With this change, split_try combinations of leaves and split points are chosen in every split_candidate. Hence the number of splits tried out in a split_candidate is reduced to just split_try.
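
The difference in the number of tried splits can be illustrated with a small Python sketch. This is not the code from this pull request: the function names, the dictionary layout of a split_candidate, and the choice of uniform sampling with replacement are all assumptions made only to show the counts described above.

```python
import random


def old_trials(leaves, split_try, rng):
    """Old behaviour (sketch): try split_try split points in EVERY leaf.

    `leaves` is a hypothetical mapping leaf id -> list of possible split points.
    """
    return [
        (leaf_id, rng.choice(points))
        for leaf_id, points in leaves.items()
        for _ in range(split_try)
    ]


def new_trials(leaves, split_try, rng):
    """New behaviour (sketch): try split_try (leaf, split point) combinations in total."""
    leaf_ids = list(leaves)
    trials = []
    for _ in range(split_try):
        # Assumption: the leaf is drawn uniformly, then a split point within it.
        leaf_id = rng.choice(leaf_ids)
        trials.append((leaf_id, rng.choice(leaves[leaf_id])))
    return trials


# One hypothetical split_candidate with three leaves.
leaves = {0: [0.1, 0.5, 0.9], 1: [0.2, 0.4], 2: [0.3, 0.6, 0.7]}
rng = random.Random(0)

old = old_trials(leaves, split_try=10, rng=rng)
new = new_trials(leaves, split_try=10, rng=rng)
print(len(old), len(new))  # 30 10
```

With three leaves and split_try=10, the old scheme evaluates 30 splits per split_candidate, while the new scheme evaluates 10 regardless of how many leaves the split_candidate has.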